ABSTRACT

As part of a project formulating optimal rules for 
decision making in computer assisted instructional systems in which 
the computer is used as a decision support tool, an approach that 
simultaneously optimizes classification of students into two 
treatments, each followed by a mastery decision, is presented using 
the framework of Bayesian decision theory. The main advantages of 
handling the three decision points simultaneously compared with 
separate optimization of such decisions are more efficient use of 
data and the use of more realistic utility structures. Both optimal 
weak monotone and strong monotone rules are considered. The results 
are empirically illustrated using data for 17,259 students for the 
problem, well-known in The Netherlands, of selecting optimum 
continuation schools at the end of elementary school on the basis of
achievement test scores. (Contains 2 tables, 1 figure, and 6 
references.) (SLD) 
Abstract



An approach that simultaneously optimizes classification of students into two 
treatments each followed by a mastery decision is presented using the framework 
of Bayesian decision theory. The main advantages of handling the three decision 
points simultaneously compared with separate optimization of such decisions are 
more efficient use of data and the use of more realistic utility structures. Both 
optimal weak monotone and strong monotone rules will be considered. The 
results are empirically illustrated using data from the well-known problem in the
Netherlands of selecting optimal continuation schools on the basis of achievement 
test scores. 
Introduction

In the relatively short period of time that instructional programs in 
computer-aided instruction (CAI) systems have been under development, much 
has been learned about the construction of instructional materials. Unfortunately, 
corresponding progress does not seem to have been made on the matter of 
developing appropriate testing methods and decision procedures for use in such 
systems. An appropriate set of testing methods and decision-making procedures would
facilitate an efficient flow of students through a CAI system. 

In a typical individualized program the instruction is divided into 
comparatively small instructional treatments or modules. In addition, all modules 
are delimited by means of clear-cut learning objectives. In an adaptive CAI
system, decisions have to be made at several points in time about how each
individual student should proceed from one module to another. Such decisions
mostly depend on the student's results on a few achievement test items
administered right after a module, as well as on his preceding (test) history in
the system.

The purpose of this research project is to formulate optimal rules for 
instructional decision making in CAI systems in which the computer can be used 
as a decision support tool. The successful implementation of a CAI system 
depends, in part, upon the availability of appropriate testing and decision-making
procedures to guide the student through the system. For instance, if a student is
not directed to an appropriate module, his motivation may decrease because the
instruction does not match his specific learning characteristics. Also, the
(expensive) computer time can be reduced considerably by making better
instructional decisions in CAI systems.

Instructional networks in CAI systems can be represented as 
combinations of four elementary test-based decisions, namely selection, mastery,
placement, and classification decisions (van der Linden, 1990). To optimize such 
combinations of decision problems within a Bayesian decision-theoretic 
framework (e.g., Ferguson, 1967), two major approaches can be distinguished. 
First, each decision can be optimized separately by maximizing the expected utility
given only the test data gathered for that individual decision. Second, all
decisions can be optimized simultaneously by maximizing the expected utility over
all possible combinations of decision outcomes (Vos, 1991, 1993, 1994). This 
paper explores how rules for the simultaneous optimization of combinations of 
decisions can be found. 

As an example, one classification decision with two treatments, each
followed by a mastery decision, is combined into a decision network (see Figure 1).
This simple classification-mastery decision problem may be important for the
classification of students in CAI systems with tracks at different levels, each
followed by a mastery test at the end of the track. Other well-known examples are
educational guidance situations in which the most promising schools must be
identified; such a situation is considered in the empirical example later on.



Insert Figure 1 about here 



Compared with separate optimization of decisions, a simultaneous approach is
expected to have two main advantages. First, it is expected that rules can be
found that make more efficient use of the data in the decision network. Second,
it is expected that more realistic utility structures can be handled in a
simultaneous approach.

The classification-mastery decision problem 

In classification, the decision problem consists of a choice among 
several alternative treatments to which students have to be assigned on the basis 
of their test scores. Prior to the treatments, all students are administered the same 
classification test and the success of each treatment is measured by its own 
criterion. Completion of each treatment is followed by a mastery test, which the
student may pass or fail. Performance on this test is used to decide whether or
not the students have profited enough from a treatment to be dismissed and to
proceed with a subsequent treatment.

In the following, we shall suppose that the test scores observed prior to
the treatments are denoted by a random variable $X$. Each treatment $j$ is followed
by a mastery test, with scores denoted by a random variable $Y_j$ ($j = 0, 1$). Let
$T_j$ represent the classical test theory true score underlying $Y_j$. Furthermore, it
is assumed that, for students classified into treatment $j$, $X$, $Y_j$, and $T_j$ have a
joint distribution $f_j(x, y_j, t_j)$.

Rescaling of the criterion variables 

For technical reasons, the observable criterion variables $Y_0$ and $Y_1$ will be
rescaled such that they both take values on the same domain. As a result, for the
realizations $y_0$ and $y_1$ of the random variables $Y_0$ and $Y_1$, the indices 0 and 1
can be dropped in the remainder of this paper, because $y_0$ and $y_1$ now
represent mathematical variables with the same domain. Of course, this does not
mean that a subject receives the same value for $y$ if (s)he follows different
treatments.

On the other hand, the indices 0 and 1 will be maintained for $Y_0$ and $Y_1$
because they represent different random variables. Likewise, the indices 0 and 1
will be maintained in the associated density and cumulative distribution functions.

Similarly, since $T$ is defined as the expectation of $Y$ according to
classical test theory, the indices 0 and 1 will be dropped for the realizations $t_0$
and $t_1$ of $T_0$ and $T_1$, whereas the indices will again be maintained for the random
variables $T_0$ and $T_1$ as well as for their associated density and cumulative
distribution functions.

In accordance with the foregoing all functions of y and t to be 
introduced below will be defined on the new scale. 
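The paper does not state which transformation is used for the rescaling. A minimal sketch, assuming a linear map from each criterion's original range onto the common 0-50 domain used in the application below; the function name `rescale_linear` and the example ranges are hypothetical.

```python
def rescale_linear(value: float, lo: float, hi: float,
                   new_lo: float = 0.0, new_hi: float = 50.0) -> float:
    """Map `value` from [lo, hi] linearly onto [new_lo, new_hi].

    Assumption: the rescaling of Y_0 and Y_1 is linear; the paper only
    requires that both criteria end up on the same domain.
    """
    if hi <= lo:
        raise ValueError("hi must be larger than lo")
    return new_lo + (value - lo) * (new_hi - new_lo) / (hi - lo)


# Hypothetical criteria: a proportion of subject matter mastered (0-1)
# and a grade on a 1-10 scale, both mapped onto 0-50.
print(rescale_linear(0.52, 0.0, 1.0))   # proportion 0.52 -> 26 on the 0-50 scale
print(rescale_linear(7.0, 1.0, 10.0))   # grade 7 on 1-10 -> about 33.33
```

A linear map preserves the ordering of scores and the relative distances between them, so a cutting score keeps the same meaning on the original and the common scale.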

Weak monotone and strong monotone rules 

In the present paper, we restrict the range of all possible decision rules
by considering only monotone rules, that is, rules using cutting scores. Let $x_c$,
$y_{cj}$, and $t_{cj}$ denote the cutting scores on the random variables $X$, $Y_j$, and $T_j$,
respectively, where $t_{cj}$ is set in advance by the decision maker. The
classification-mastery decision problem now consists of simultaneously setting
cutting scores $x_c$ and $y_{cj}$ such that, given the value of $t_{cj}$, the expected utility
is maximized ($j = 0, 1$).

In general, the observed scores on the classification test may or may not
be explicitly taken into account in setting cutting scores on the mastery test score
variable $Y_j$ ($j = 0, 1$). For instance, it seems reasonable that students who are
assigned to treatment 1 with observed classification scores equal to or just above
$x_c$ must compensate for their relatively low classification scores with higher
scores on the mastery test $Y_1$. Rules in which $y_{cj}$ is or is not allowed to depend
on $x$ will be called weak monotone and strong monotone rules, respectively. Thus,
for each $x < x_c$ and each $x \ge x_c$, the weak cutting scores on the mastery tests $Y_0$
and $Y_1$ have to be computed from some functions $y_{c0}(x)$ and $y_{c1}(x)$, respectively.

Let $a_{jh}$ stand for the action either to retain ($h = 0$) or advance ($h = 1$) a
student who is classified into treatment $j$ ($j = 0, 1$). Then, for the decision network
of Figure 1, the most general form of the decision rule is a weak rule $\delta$ defined as:



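$$
\begin{aligned}
\{(x,y) : \delta(x,y) = a_{00}\} &= A \times B_0(x)\\
\{(x,y) : \delta(x,y) = a_{01}\} &= A \times B_0^c(x)\\
\{(x,y) : \delta(x,y) = a_{10}\} &= A^c \times B_1(x)\\
\{(x,y) : \delta(x,y) = a_{11}\} &= A^c \times B_1^c(x),
\end{aligned}
\tag{1}
$$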

where $A$, $A^c$, $B_j(x)$, and $B_j^c(x)$ stand, respectively, for the sets of $x$ and $y$
values for which a student is classified into treatment 0, classified into treatment 1,
retained in treatment $j$, and advanced in treatment $j$. Thus, a weak monotone rule
$\delta$ can be defined for our example as:



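$$
\delta(X, Y_j) =
\begin{cases}
a_{00} & \text{for } X < x_c,\ Y_0 < y_{c0}(x)\\
a_{01} & \text{for } X < x_c,\ Y_0 \ge y_{c0}(x)\\
a_{10} & \text{for } X \ge x_c,\ Y_1 < y_{c1}(x)\\
a_{11} & \text{for } X \ge x_c,\ Y_1 \ge y_{c1}(x).
\end{cases}
\tag{2}
$$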
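Rule (2) translates directly into a function. The sketch below is illustrative only: `weak_monotone_rule`, `strong_monotone_rule`, and the cutting-score functions `cut0` and `cut1` are hypothetical names and values, not the optimal functions derived in this paper. Ties ($X = x_c$ or $Y_j = y_{cj}(x)$) go to treatment 1 and to advancement, as in (2); a strong monotone rule is the special case in which both mastery cutting scores are constants.

```python
from typing import Callable

CutFunction = Callable[[float], float]  # maps x to a mastery cutting score y_cj(x)


def weak_monotone_rule(x: float, y: float, x_c: float,
                       y_c0: CutFunction, y_c1: CutFunction) -> str:
    """Return the action label 'a{j}{h}' prescribed by rule (2).

    j: 0 if x < x_c, otherwise 1 (classification into treatment j).
    h: 0 (retain) if y < y_cj(x), otherwise 1 (advance).
    """
    j = 0 if x < x_c else 1
    cut = y_c0(x) if j == 0 else y_c1(x)
    h = 0 if y < cut else 1
    return f"a{j}{h}"


def strong_monotone_rule(x: float, y: float, x_c: float,
                         y_c0: float, y_c1: float) -> str:
    """Strong rule: the mastery cutting scores are constants, not functions of x."""
    return weak_monotone_rule(x, y, x_c, lambda _x: y_c0, lambda _x: y_c1)


# Hypothetical cutting-score functions that decrease slowly in x.
def cut0(x: float) -> float:
    return 27.0 - 0.02 * x


def cut1(x: float) -> float:
    return 28.0 - 0.02 * x


print(weak_monotone_rule(25.0, 26.9, 30.0, cut0, cut1))  # treatment 0, advanced
print(weak_monotone_rule(31.0, 27.3, 30.0, cut0, cut1))  # treatment 1, retained
```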
Since we confine ourselves to monotone rules in this paper, we have to
show that there are no nonmonotone rules with larger expected utility or,
equivalently, that the subclass of monotone rules constitutes an essentially
complete class (e.g., Ferguson, 1967, p. 55). Conditions under which the subclass
of weak monotone rules is essentially complete are given in Vos (1994). If these
conditions are met, a weak monotone solution is said to exist.

An additive threshold utility function 

A utility function $u_{jh}(t)$ evaluates the consequences of taking action $a_{jh}$
when the true score of the student is $t$. In the present paper, it is assumed that the
utility structure of the combined problem can be represented as an additive
function of the following form:

$$
\begin{aligned}
u_{0h}(t) &= w_1 u_{0c}(t) + w_2 u_{0hm}(t)\\
u_{1h}(t) &= w_1 u_{1c}(t) + w_3 u_{1hm}(t)
\end{aligned}
\tag{3}
$$

where $u_{jc}(t)$ and $u_{jhm}(t)$ represent the utility functions for the separate
classification and mastery decisions under treatment $j$, respectively, and $w_1$, $w_2$,
and $w_3$ represent nonnegative weights. Since utility is measured at least on an
interval scale, assuming $w_2 = w_3$ (i.e., the utility functions for both mastery
decisions are equally weighted), the weights in (3) can always be rescaled as follows:

$$u_{jh}(t) = w\,u_{jc}(t) + \frac{1-w}{2}\,u_{jhm}(t) \tag{4}$$

where the weight $w$ satisfies $0 \le w \le 1$.
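The rescaling from (3) to (4) can be made explicit. Assuming $w_2 = w_3$ and $w_1 + 2w_2 > 0$, dividing (3) by the positive constant $w_1 + 2w_2$ (an admissible transformation for interval-scale utilities) gives

$$
\frac{u_{jh}(t)}{w_1 + 2w_2}
= \underbrace{\frac{w_1}{w_1 + 2w_2}}_{w}\,u_{jc}(t)
+ \underbrace{\frac{w_2}{w_1 + 2w_2}}_{(1-w)/2}\,u_{jhm}(t),
\qquad
1 - w = \frac{2w_2}{w_1 + 2w_2}.
$$

For example, $w_1 = 3$ and $w_2 = w_3 = 1$ give $w = 3/5 = 0.6$ and $(1-w)/2 = 0.2$.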
In the Introduction it was remarked that one of the main advantages of a
simultaneous approach is that more realistic utility structures can be handled.
This is nicely demonstrated by the additive structure of (4), in which utility
functions defined on the ultimate criterion variables $T_0$ and $T_1$ are also used in
an earlier decision problem, namely the problem of classifying students into
treatment 0 or treatment 1.

In the classification-mastery problem, the following well-known
threshold utility functions (e.g., Hambleton & Novick, 1973) are adopted for the
separate classification and mastery decisions:



$$
u_{jc}(t) =
\begin{cases}
b_{j0} & \text{for } T_j < t_{cj}\\
b_{j1} & \text{for } T_j \ge t_{cj}
\end{cases}
\tag{5}
$$



$$
u_{jhm}(t) =
\begin{cases}
d_{jh0} & \text{for } T_j < t_{cj}\\
d_{jh1} & \text{for } T_j \ge t_{cj}
\end{cases}
\qquad (h = 0, 1)
\tag{6}
$$



The choice of threshold utility functions implies that the 'seriousness' of
all possible consequences of the decisions can be summarized by four and eight
constants in (5) and (6), one for each of the four and eight possible decision
outcomes, respectively. The utility parameters $b_{jh}$, $d_{jh0}$, and $d_{jh1}$ ($j, h = 0, 1$)
can be empirically assessed using lottery methods (e.g., Hambleton & Novick, 1973).
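Equations (4)-(6) combine into a single function of the action $h$ and the true score $t$. A sketch, assuming the index layout $d_{jhk}$ (action $h$, then true-score state $k$) implied by the parameter names $d_{jh0}$ and $d_{jh1}$; the class and function names and all numerical utilities in the usage lines are hypothetical, not values assessed by lottery methods.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ThresholdUtilities:
    """Utility parameters for one treatment j.

    t_c: cutting score t_cj on the true-score scale.
    b:   (b_j0, b_j1), classification utilities below / at-or-above t_cj, eq. (5).
    d:   d[h][k] = d_jhk, mastery utility for action h (0 = retain,
         1 = advance) and true-score state k (0 = below, 1 = at or above), eq. (6).
    """
    t_c: float
    b: Tuple[float, float]
    d: Tuple[Tuple[float, float], Tuple[float, float]]


def additive_utility(h: int, t: float, p: ThresholdUtilities, w: float) -> float:
    """u_jh(t) = w * u_jc(t) + (1 - w) / 2 * u_jhm(t), eq. (4)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    k = 0 if t < p.t_c else 1          # true score below / at or above t_cj
    return w * p.b[k] + (1.0 - w) / 2.0 * p.d[h][k]


# Hypothetical utilities for treatment 0 (t_c0 = 26 as in the application).
treatment0 = ThresholdUtilities(t_c=26.0, b=(0.0, 1.0),
                                d=((0.8, 0.2), (0.0, 1.0)))
print(additive_utility(h=1, t=30.0, p=treatment0, w=0.6))  # advance a master
print(additive_utility(h=0, t=20.0, p=treatment0, w=0.6))  # retain a non-master
```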
Optimal weak monotone and SMMEU rules

After inserting the additive threshold utility functions (3)-(6), the expected
utility of each of the four possible actions with respect to $f_j(x,y,t)$ can be
calculated. Adding up these expected utilities yields the expected utility for the
simultaneous approach, $E[U_{\mathrm{sim}}(A^c, B_0^c(x), B_1^c(x))]$. In Vos (1994) it is
indicated that an upper bound to $E[U_{\mathrm{sim}}(A^c, B_0^c(x), B_1^c(x))]$ is obtained if the
sets $B_0^c(x)$, $B_1^c(x)$, and $A^c$ take the form $\{y \mid g(x,y) \ge 0\}$, $\{y \mid h(x,y) \ge 0\}$,
and $\{x \mid k(x, B_0^c(x), B_1^c(x)) \ge 0\}$, respectively, with $B_0^c(x)$ and $B_1^c(x)$
appearing as integration regions in the function $k(x, B_0^c(x), B_1^c(x))$.
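Written out with the sets of (1) and the utilities of (4), the sum of the four expected utilities takes the form below. This is a sketch in the paper's notation rather than a formula quoted from Vos (1994); the inner integrals run over all true scores $t$.

$$
\begin{aligned}
E[U_{\mathrm{sim}}(A^c, B_0^c(x), B_1^c(x))]
&= \int_{A}\Big[\int_{B_0(x)}\!\int u_{00}(t)\,f_0(x,y,t)\,dt\,dy
 + \int_{B_0^c(x)}\!\int u_{01}(t)\,f_0(x,y,t)\,dt\,dy\Big]dx\\
&\quad + \int_{A^c}\Big[\int_{B_1(x)}\!\int u_{10}(t)\,f_1(x,y,t)\,dt\,dy
 + \int_{B_1^c(x)}\!\int u_{11}(t)\,f_1(x,y,t)\,dt\,dy\Big]dx.
\end{aligned}
$$

Because $u_{j0}(t)$ and $u_{j1}(t)$ share the classification term $w\,u_{jc}(t)$ in (4), that term drops out of the comparison between $B_j(x)$ and $B_j^c(x)$ for fixed $x$, whereas it does affect the choice of $A^c$.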

Optimal weak monotone rules

For weak monotone rules, the sets $B_j^c(x)$ and $A^c$ take the form
$[y_{cj}(x), \infty)$ and $[x_c, \infty)$, respectively. Assuming the monotonicity conditions
for weak simultaneous rules are satisfied, it then follows that optimal weak monotone
rules can be found for those values of $y_{c0}(x)$, $y_{c1}(x)$, and $x_c$ for which
$g(x, y_{c0}(x)) = 0$, $h(x, y_{c1}(x)) = 0$, and $k(x_c, y_{c0}(x_c), y_{c1}(x_c)) = 0$, respectively.

Since $g(x, y_{c0}(x)) = 0$ and $h(x, y_{c1}(x)) = 0$ hold for all $x$, and thus for $x_c$,
the optimal weak cutting score on the classification test can be found by solving
$g(x_c, y_{c0}(x_c)) = 0$, $h(x_c, y_{c1}(x_c)) = 0$, and $k(x_c, y_{c0}(x_c), y_{c1}(x_c)) = 0$
simultaneously for $x_c$, $y_{c0}(x_c)$, and $y_{c1}(x_c)$. For each $x < x_c$ and each $x \ge x_c$,
the optimal weak cutting scores on the mastery tests $Y_0$ and $Y_1$ can then be obtained
by solving $g(x, y_{c0}(x)) = 0$ and $h(x, y_{c1}(x)) = 0$ for $y_{c0}(x)$ and $y_{c1}(x)$, respectively.
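The paper leaves the explicit form of $g$ and $h$ to Vos (1994). For intuition only, the sketch below uses one consequence of the additive form (4): for a student in treatment $j$ with scores $(x, y)$, the classification term cancels when retention is compared with advancement, so (assuming $w < 1$) advancement is preferred when $(d_{j11} - d_{j01})\,P(T_j \ge t_{cj} \mid x, y) \ge (d_{j00} - d_{j10})\,P(T_j < t_{cj} \mid x, y)$. With a trivariate normal $f_j$, $T_j \mid (x, y)$ is normal, so the boundary $y_{cj}(x)$ follows in closed form, provided the regression weight of $T_j$ on $Y_j$ is positive. The function name, the covariance matrix, and the utilities are hypothetical, and this is not the NEWTON procedure.

```python
from statistics import NormalDist


def weak_mastery_cut(x, mean, cov, t_c, d):
    """Return y_c(x) at which advancing and retaining have equal posterior utility.

    mean: (mu_X, mu_Y, mu_T); cov: 3x3 covariance matrix of (X, Y, T).
    d[h][k]: mastery utility for action h (0 = retain, 1 = advance) and
             true-score state k (0 = below t_c, 1 = at or above t_c).
    """
    mu_x, mu_y, mu_t = mean
    s_xx, s_xy, s_xt = cov[0]
    s_yy, s_yt = cov[1][1], cov[1][2]
    s_tt = cov[2][2]

    # Regression of T on (X, Y): beta = Sigma_XY^{-1} (s_xt, s_yt).
    det = s_xx * s_yy - s_xy ** 2
    beta_x = (s_yy * s_xt - s_xy * s_yt) / det
    beta_y = (s_xx * s_yt - s_xy * s_xt) / det
    if beta_y <= 0:
        raise ValueError("P(T >= t_c | x, y) must increase in y")
    resid_sd = (s_tt - beta_x * s_xt - beta_y * s_yt) ** 0.5

    # Indifference probability p*: (d11 - d01) p* = (d00 - d10) (1 - p*).
    loss_advance = d[0][0] - d[1][0]   # d_j00 - d_j10
    loss_retain = d[1][1] - d[0][1]    # d_j11 - d_j01
    p_star = loss_advance / (loss_advance + loss_retain)

    # P(T >= t_c | x, y) = p*  <=>  E[T | x, y] = t_c + z* * resid_sd.
    z_star = NormalDist().inv_cdf(p_star)
    target_mean = t_c + z_star * resid_sd
    return mu_y + (target_mean - mu_t - beta_x * (x - mu_x)) / beta_y


# Hypothetical parameters: classical test theory covariance with
# sd_X = 8, sd_Y = 3, reliability 0.8, corr(X, Y) = 0.3.
cov = [[64.0, 7.2, 7.2],
       [7.2, 9.0, 7.2],
       [7.2, 7.2, 7.2]]
mean = (35.0, 30.0, 30.0)
d = ((0.8, 0.2), (0.0, 1.0))
for x in (25.0, 35.0, 45.0):
    print(x, round(weak_mastery_cut(x, mean, cov, 26.0, d), 3))
```

With these hypothetical numbers the cut falls by roughly 0.03 points per point on $X$, a slow decrease of the kind reported for $y_{c0}(x)$ and $y_{c1}(x)$ in the empirical results; raising the loss for wrongly advancing a non-master ($d_{j00} - d_{j10}$) shifts the whole curve upward.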
SMMEU rules

Since in educational testing one is accustomed to using strong cutting
scores, optimal rules will also be derived within the subclass of strong monotone
rules, without checking the monotonicity conditions. Rules of this type will be
termed SMMEU (strong monotone rules with maximum expected utility) rules.

The set of SMMEU cutting scores, say $x_c^*$, $y_{c0}^*$, and $y_{c1}^*$, can be
obtained by inserting $A^c = [x_c, \infty)$ and $B_j^c(x) = [y_{cj}, \infty)$ into
$E[U_{\mathrm{sim}}(A^c, B_0^c(x), B_1^c(x))]$, differentiating with respect to $x_c$, $y_{c0}$, and
$y_{c1}$, setting the resulting expressions equal to zero, and solving simultaneously
for $x_c$, $y_{c0}$, and $y_{c1}$ (Vos, 1994).
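The paper obtains the SMMEU cutting scores from first-order conditions solved with NEWTON. As a rough way to check such a solution, one can instead approximate $E[U_{\mathrm{sim}}]$ by Monte Carlo under trivariate normal densities $f_0$ and $f_1$ and maximize it over a coarse grid of strong cutting scores. This grid search is a swapped-in technique, not the author's procedure; the function names, the grids, and all distribution and utility parameters below are hypothetical.

```python
import itertools
import random


def cholesky3(c):
    """Lower-triangular L with L L^T = c for a 3x3 covariance matrix c."""
    low = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = (c[i][i] - s) ** 0.5
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return low


def draw(mean, cov, n, rng):
    """n samples (x, y, t) from a trivariate normal distribution."""
    low = cholesky3(cov)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        samples.append(tuple(mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                             for i in range(3)))
    return samples


def utility(h, t, t_c, b, d, w):
    """Additive threshold utility (4)-(6) for one treatment."""
    k = 0 if t < t_c else 1
    return w * b[k] + (1.0 - w) / 2.0 * d[h][k]


def expected_utility(x_c, y_c, samples, params, w):
    """Monte Carlo estimate of E[U_sim] for a strong monotone rule.

    samples[j] are draws from f_j; both lists must have the same length.
    """
    total = 0.0
    for j in (0, 1):
        t_c, b, d = params[j]
        for x, y, t in samples[j]:
            if (x < x_c) == (j == 0):          # student classified into treatment j
                h = 0 if y < y_c[j] else 1
                total += utility(h, t, t_c, b, d, w)
    return total / len(samples[0])


def smmeu_grid_search(samples, params, w, xs, ys0, ys1):
    """Return (EU, x_c, y_c0, y_c1) with the largest estimated expected utility."""
    return max((expected_utility(x_c, (y0, y1), samples, params, w), x_c, y0, y1)
               for x_c, y0, y1 in itertools.product(xs, ys0, ys1))


rng = random.Random(1)
cov0 = [[64.0, 7.2, 7.2], [7.2, 9.0, 7.2], [7.2, 7.2, 7.2]]
cov1 = [[64.0, 8.0, 8.0], [8.0, 6.25, 5.0], [8.0, 5.0, 5.0]]
samples = (draw((35.0, 30.0, 30.0), cov0, 1000, rng),
           draw((35.0, 29.0, 29.0), cov1, 1000, rng))
params = ((26.0, (0.4, 0.6), ((0.8, 0.2), (0.0, 1.0))),   # (t_c0, b_0, d_0)
          (27.0, (0.0, 1.0), ((0.8, 0.2), (0.0, 1.0))))   # (t_c1, b_1, d_1)
grid = [20.0, 25.0, 30.0, 35.0, 40.0]
best = smmeu_grid_search(samples, params, 0.6, grid, [24.0, 26.0, 28.0],
                         [25.0, 27.0, 29.0])
print(best)
```

Because the samples are drawn once and reused for every grid point (common random numbers), differences between grid points are not masked by fresh sampling noise; a finer grid or a local optimizer could refine the result.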

The optimal weak and SMMEU cutting scores can now be computed by solving
the resulting systems of nonlinear equations. Assuming a trivariate
normal distribution for $f_j(x,y,t)$, a computer program called NEWTON, available
on request from the author, was written to calculate the cutting scores iteratively
(Vos, 1994). For each $x < x_c$ and each $x \ge x_c$, the optimal weak cutting scores on the
mastery tests under treatments 0 and 1, $y_{c0}(x)$ and $y_{c1}(x)$, were computed
iteratively by solving $g(x, y_{c0}(x)) = 0$ and $h(x, y_{c1}(x)) = 0$ for $y_{c0}(x)$ and $y_{c1}(x)$,
respectively. These procedures were also implemented in NEWTON. In the
program NEWTON, only the utility parameters $b_{jh}$, $d_{jh0}$, and $d_{jh1}$, the weight $w$
(i.e., the relative influence of the separate classification decision, in %), and the
cutting score $t_{cj}$ on the true score scale $T_j$ have to be specified by the decision
maker ($j, h = 0, 1$).
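The paper reports maximum likelihood estimates of the observable statistics (Table 1) but not how the true-score moments of the trivariate normal were fixed. A common classical test theory construction, shown here as an assumption rather than as the author's procedure, fills them in from the reliability: with measurement error uncorrelated with $T_j$ and with $X$, $\mathrm{Var}(T_j)$ equals the reliability times $\mathrm{Var}(Y_j)$, $\mathrm{Cov}(Y_j, T_j) = \mathrm{Var}(T_j)$, and $\mathrm{Cov}(X, T_j) = \mathrm{Cov}(X, Y_j)$. The sketch also assumes that $\rho_0$ and $\rho_1$ in Table 1 are the correlations between $X$ and $Y_0$, $Y_1$, and that the standard deviation of $X$ is the same under both treatments; the function names are hypothetical.

```python
def ctt_covariance(sd_x, sd_y, reliability_y, corr_xy):
    """Covariance matrix of (X, Y_j, T_j) under classical test theory.

    Assumes the error E_j = Y_j - T_j is uncorrelated with T_j and with X, so
    Var(T_j) = reliability * Var(Y_j), Cov(Y_j, T_j) = Var(T_j), and
    Cov(X, T_j) = Cov(X, Y_j).
    """
    var_t = reliability_y * sd_y ** 2
    cov_xy = corr_xy * sd_x * sd_y
    return [[sd_x ** 2, cov_xy, cov_xy],
            [cov_xy, sd_y ** 2, var_t],
            [cov_xy, var_t, var_t]]


def corr_x_true(cov):
    """Implied correlation between X and T_j (corr(X, Y_j) disattenuated)."""
    return cov[0][2] / (cov[0][0] * cov[2][2]) ** 0.5


# Table 1 values, assuming rho_j is the correlation between X and Y_j.
cov0 = ctt_covariance(7.971, 3.246, 0.812, 0.129)   # treatment 0 (LVE)
cov1 = ctt_covariance(7.971, 2.208, 0.803, 0.365)   # treatment 1 (LGE)
print(round(corr_x_true(cov0), 3), round(corr_x_true(cov1), 3))
```

The implied correlation between $X$ and $T_j$ exceeds $\rho_j$ because it is corrected for measurement error in $Y_j$.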

It is important to notice that the weak monotone approach actually
provides us with some 'artificial intelligence' for setting optimal weak cutting
scores: the more test data become available for a student, the better the optimal
weak cutting scores for that student can be set. To be precise, the optimal weak
cutting score on the classification test still has to be set at the same point $x_c$ for
all students, while the optimal weak cutting scores on the mastery tests, $y_{c0}(x)$
and $y_{c1}(x)$, can be set by explicitly taking into account each student's observed
score on the classification test. In other words, the program NEWTON operates as
an Intelligent Tutoring System (ITS) in the sense of monitoring the student through
the instructional network in such a way that optimal advantage is taken of each
student's preceding (test) history in the CAI system.

An application to a real-life decision problem 

The numerical example concerns the assignment of pupils to appropriate
continuation schools at the end of elementary school (i.e., at grade 8), a
problem that is well known in the Dutch educational system. The Dutch National
Institute of Educational Measurement (CITO) annually prepares an achievement
test (Eindtoets Basisonderwijs), which is used by most elementary schools for
this purpose. In addition, a grade-point average is used to decide whether or not
a pupil has finished the first year of secondary school successfully. The problem
can therefore be characterized as a classification-mastery decision. Test scores
on the CITO achievement test as well as the grade-point average range from 0 to 50.

In the analyses reported here, Lower Vocational Education (LVE) and
Lower General Education (LGE) were selected as treatments, with 1333 and
15926 pupils assigned to them, respectively; LVE and LGE were considered as
treatment 0 ('lower') and treatment 1 ('higher'), respectively.
Pupils were considered to have passed the first year of school 0 and 1
successfully if they had mastered at least 52% and 54% of the total subject
matter, respectively, at the end of the first year. Therefore, on the 0-50 scale,
$t_{c0}$ and $t_{c1}$ were fixed at 26 (52% of 50) and 27 (54% of 50), respectively.

The statistics needed to compute the optimal weak monotone and
SMMEU rules were obtained as maximum likelihood estimates. The resulting
estimates are shown in Table 1.

Insert Table 1 about here 



Results for the simultaneous approach 

Using the program NEWTON, the SMMEU cutting scores and the set of weak cutting
scores $(x_c, y_{c0}(x_c), y_{c1}(x_c))$ were computed for three different sets of values of the
utility parameters as well as for $w = 0.3$, $0.6$, and $0.9$. The results are summarized in
Table 2.

Insert Table 2 about here 



The optimal weak monotone rules are given by $x_c$, $y_{c0}(x)$ for $x < x_c$, and $y_{c1}(x)$
for $x \ge x_c$. Using the program NEWTON, it appeared that both $y_{c0}(x)$ and $y_{c1}(x)$
decreased very slowly in $x$ for $x < x_c$ and $x \ge x_c$, respectively. These
patterns were in accordance with our expectation that students with classification
scores far above or just below $x_c$ are sooner allowed to proceed with the next
treatment than pupils with classification scores just above or far below $x_c$.

As can be seen from Table 2, and using the decreasing character of
$y_{cj}(x)$, increasing values of $w$ resulted in higher optimal weak cutting scores on
the classification test, whereas the optimal weak cutting scores on both mastery
tests were hardly influenced by the value of $w$. This makes sense, since one might
expect that with increasing weight for $u_{jc}(t)$, classification into the 'higher'
treatment becomes less likely.

Optimal separate cutting scores

In Vos (1994) it is indicated that optimal cutting scores for the separate
classification and mastery decisions, say $x_c^{\mathrm{sep}}$ and $y_{cj}^{\mathrm{sep}}$, can easily be derived
by imposing certain restrictions on the expected utility for a simultaneous approach.
The results are also summarized in Table 2.

As can be seen from Table 2, in particular for low values of $w$, the
optimal cutting scores for the separate classification decision were considerably
higher than those in the weak monotone model, implying that students were
assigned to higher types of education much sooner in the weak monotone
model.

Furthermore, Table 2 shows that $y_{c0}(x_c)$ and $y_{c1}(x_c)$ were somewhat
higher than $y_{c0}^{\mathrm{sep}}$ and $y_{c1}^{\mathrm{sep}}$, respectively. This makes sense, because
if students were sooner assigned to the 'higher' treatment in a weak monotone
approach, it seems reasonable that those students who were just classified into
treatment 0 and 1 had to compensate for their relatively low classification scores
with higher optimal weak cutting scores on the mastery tests. The decreasing
character of $y_{cj}(x)$ in $x$, however, implies that with increasing classification
scores the optimal weak cutting scores on the mastery tests can be slowly
decreased again.
Comparison of expected utilities

In the Introduction it was remarked that one of the main advantages of a 
simultaneous approach was the expectation that rules making more efficient use 
of the data in the decision network could be found. As a consequence, one might 
expect an increase in expected utility compared with a separate approach. To 
investigate whether this expectation could be confirmed, the weighted sum of the
expected utilities for the optimal separate rules was compared with the expected
utilities for the two simultaneous approaches (weak monotone and SMMEU rules)
using a computer program called UTILITY, available on request from the author.
The results are also depicted in Table 2.

Table 2 indicates that, although the differences were rather small, the
weak monotone approach yielded the largest expected utility of the three
approaches for all utility structures. In particular, for a large weight for the utility
of the classification decision, hardly any differences could be found. Though this
result does not contradict our predictions, we had expected larger differences.

Concluding remarks 

A final remark is appropriate. The models presented in this paper were 
applied to the problem of assigning students to optimal types of secondary 
education. However, the procedures advocated in this paper have a larger scope. 
For instance, in addition to the important application of deriving optimal rules for 
instructional decision making in CAI systems, the simple classification-mastery
decision problem may be useful in the area of psychotherapy in which patients 
have to be classified into the most appropriate therapy followed by a test, which 
has to be passed before they can be dismissed from the therapy. 
Table 1

Statistics for the Classification and Mastery Tests ($X$ and $Y_j$)

Statistic              X          Y (Treatment 0)      Y (Treatment 1)
Mean                   34.324     31.551               29.621
Standard Deviation     7.971      3.246                2.208
Reliability                       0.812                0.803
Correlation                       $\rho_0 = 0.129$     $\rho_1 = 0.365$